3,699 research outputs found

Gaussian approximation for the sup-norm of high-dimensional matrix-variate U-statistics and its applications

Author: Chen Xiaohui
Publication date: 30/09/2016

This paper studies the Gaussian approximation of high-dimensional and non-degenerate U-statistics of order two under the supremum norm. We propose a two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution. Specifically, subject to mild moment conditions on the kernel, we establish an explicit rate of convergence that decays polynomially in the sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide a practical Gaussian wild bootstrap method to approximate the quantiles of the maxima of centered U-statistics and prove its asymptotic validity. The wild bootstrap is demonstrated on statistical applications for high-dimensional non-Gaussian data, including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that Gaussian-like convergence rates can be achieved for heavy-tailed data, which are less conservative than those obtained by the Bonferroni technique that ignores the dependency in the underlying data distribution. In addition, we show that even for subgaussian distributions, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.

arXiv.org e-Print Archive

A Note on Moment Inequality for Quadratic Forms

Author: Chen Xiaohui
Publication date: 07/05/2014

Moment inequalities for quadratic forms of random vectors are of particular interest in covariance matrix testing and estimation problems. In this paper, we prove a Rosenthal-type inequality, which exhibits new features and certain improvements beyond the unstructured Rosenthal inequality for quadratic forms when the dimension of the vectors increases without bound. Applications to testing block diagonal structures and detecting sparsity in the high-dimensional covariance matrix are presented. Comment: 12 pages, 0 figures

arXiv.org e-Print Archive

A robust bootstrap change point test for high-dimensional location parameter

Authors: Chen Xiaohui, Yu Mengjia
Publication date: 13/10/2021

We consider the problem of change point detection for high-dimensional distributions in a location family when the dimension can be much larger than the sample size. In change point analysis, the widely used cumulative sum (CUSUM) statistics are sensitive to outliers and heavy-tailed distributions. In this paper, we propose a robust, tuning-free (i.e., fully data-dependent), and easy-to-implement change point test that enjoys strong theoretical guarantees. To achieve robustness in a nonparametric setting, we formulate the change point detection in the multivariate $U$-statistics framework with anti-symmetric and nonlinear kernels. Specifically, the within-sample noise is canceled out by anti-symmetry of the kernel, while the signal distortion under certain nonlinear kernels can be controlled such that the between-sample change point signal is magnitude preserving. A (half) jackknife multiplier bootstrap (JMB) tailored to the change point detection setting is proposed to calibrate the distribution of our $\ell^{\infty}$-norm aggregated test statistic. Subject to mild moment conditions on kernels, we derive the uniform rates of convergence for the JMB to approximate the sampling distribution of the test statistic, and analyze its size and power properties. Extensions to multiple change point testing and estimation are discussed with illustration from numerical studies.

arXiv.org e-Print Archive

Inference in Kingman's Coalescent with Particle Markov Chain Monte Carlo Method

Authors: Chen Yifei, Xie Xiaohui
Publication date: 03/05/2013

We propose a new algorithm for posterior sampling of Kingman's coalescent, based upon the Particle Markov Chain Monte Carlo methodology. Specifically, the algorithm is an instantiation of the Particle Gibbs Sampling method, which alternately samples coalescent times conditioned on coalescent tree structures, and tree structures conditioned on coalescent times via the conditional Sequential Monte Carlo procedure. We implement our algorithm as a C++ package, and demonstrate its utility via a parameter estimation task in population genetics on both single- and multiple-locus data. The experimental results show that the proposed algorithm performs comparably to or better than several well-developed methods.

arXiv.org e-Print Archive

Randomized incomplete $U$-statistics in high dimensions

Authors: Chen Xiaohui, Kato Kengo
Publication date: 27/01/2019

This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of Big Data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations both tend to be large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics are even more computationally expensive. To overcome such a computational bottleneck, incomplete $U$-statistics obtained by sampling fewer terms of the $U$-statistic are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both non-degenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated on the application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.

arXiv.org e-Print Archive
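
The idea of "sampling fewer terms" in the abstract above can be illustrated with a minimal sketch. It assumes an order-two kernel and uniform sampling of index pairs with replacement, which is only one simple sampling scheme and not necessarily the one analyzed in the paper. The function names, the variance kernel, and the parameter `num_terms` are hypothetical and chosen for illustration.

```python
import numpy as np


def variance_kernel(a, b):
    """Order-two kernel h(x, y) = (x - y)^2 / 2, applied coordinate-wise.

    Its complete U-statistic is the unbiased sample variance of each coordinate.
    """
    return 0.5 * (a - b) ** 2


def complete_u_stat(x, kernel):
    """Average the kernel over all n * (n - 1) / 2 pairs i < j."""
    n = x.shape[0]
    i, j = np.triu_indices(n, k=1)
    return kernel(x[i], x[j]).mean(axis=0)


def incomplete_u_stat(x, kernel, num_terms, rng):
    """Average the kernel over `num_terms` randomly drawn pairs (assumed scheme).

    Pairs are drawn uniformly with replacement from all pairs with i != j,
    so the cost is O(num_terms * d) instead of O(n^2 * d).
    """
    n = x.shape[0]
    i = rng.integers(0, n, size=num_terms)
    # Adding an offset in {1, ..., n - 1} modulo n guarantees j != i.
    j = (i + rng.integers(1, n, size=num_terms)) % n
    return kernel(x[i], x[j]).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 5))
    print(complete_u_stat(x, variance_kernel))
    print(incomplete_u_stat(x, variance_kernel, num_terms=2000, rng=rng))
```

The budget `num_terms` trades statistical accuracy for computation: a larger budget brings the incomplete statistic closer to the complete one.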

Jackknife multiplier bootstrap: finite sample approximations to the $U$-process supremum with applications

Authors: Chen Xiaohui, Kato Kengo
Publication date: 13/02/2019

This paper is concerned with finite sample approximations to the supremum of a non-degenerate $U$-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the $U$-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, approximate the $U$-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, not often directly applicable in statistical problems since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the $U$-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the $U$-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic, and established under fairly general conditions on function classes and underlying distributions. Key technical tools in the proofs are new local maximal inequalities for $U$-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local $U$-processes.

arXiv.org e-Print Archive

Finite sample change point inference and identification for high-dimensional mean vectors

Authors: Chen Xiaohui, Yu Mengjia
Publication venue: Wiley
Publication date: 02/01/2021

Cumulative sum (CUSUM) statistics are widely used in change point inference and identification. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data-dependent and it has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys uniform validity in size under the null and it achieves the minimax separation rate under sparse alternatives when the dimension $p$ can be larger than the sample size $n$. Once a change point is detected, we estimate the change point location by maximizing the $\ell^{\infty}$-norm of the generalized CUSUM statistics at two different weighting scales corresponding to covariance stationary and non-stationary CUSUM statistics. For both estimators, we derive their rates of convergence and show that dimension impacts the rates only through logarithmic factors, which implies that consistency of the CUSUM estimators is possible when $p$ is much larger than $n$. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm to dynamically adjust the change point detection rule and recursively estimate their locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows an encouraging agreement with our theoretical results.

arXiv.org e-Print Archive
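
As a rough illustration of the bootstrap CUSUM test described in the abstract above, the sketch below computes an $\ell^{\infty}$-norm CUSUM statistic over interior split points and calibrates it with Gaussian multipliers. The exact form of the bootstrap statistic (in particular, centering each segment by its own sample mean) is an assumption made here for illustration and may differ from the paper. The names `cusum_sup_stat`, `bootstrap_critical_value`, and the boundary removal parameter `s0` are hypothetical.

```python
import numpy as np


def _split_quantities(x, s0):
    """Candidate splits s0..n-s0, CUSUM scale, and left/right segment means."""
    n = x.shape[0]
    s = np.arange(s0, n - s0 + 1)
    csum = np.cumsum(x, axis=0)
    left_mean = csum[s - 1] / s[:, None]
    right_mean = (csum[-1] - csum[s - 1]) / (n - s)[:, None]
    scale = np.sqrt(s * (n - s) / n)[:, None]
    return s, scale, left_mean, right_mean


def cusum_sup_stat(x, s0):
    """Max over splits and coordinates of the scaled difference in means."""
    _, scale, left_mean, right_mean = _split_quantities(x, s0)
    return np.abs(scale * (left_mean - right_mean)).max()


def bootstrap_critical_value(x, s0, alpha, num_boot, rng):
    """(1 - alpha) quantile of a Gaussian multiplier bootstrap statistic.

    Assumed form: each observation is centered by the mean of its own
    segment and multiplied by an independent N(0, 1) weight.
    """
    n = x.shape[0]
    s, scale, left_mean, right_mean = _split_quantities(x, s0)
    stats = np.empty(num_boot)
    for b in range(num_boot):
        e = rng.standard_normal(n)
        ex = np.cumsum(e[:, None] * x, axis=0)  # running sums of e_i * x_i
        ec = np.cumsum(e)                       # running sums of e_i
        left = (ex[s - 1] - ec[s - 1][:, None] * left_mean) / s[:, None]
        right = ((ex[-1] - ex[s - 1])
                 - (ec[-1] - ec[s - 1])[:, None] * right_mean) / (n - s)[:, None]
        stats[b] = np.abs(scale * (left - right)).max()
    return np.quantile(stats, 1.0 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 20))
    x[50:] += 3.0  # mean shift after observation 50
    stat = cusum_sup_stat(x, s0=10)
    crit = bootstrap_critical_value(x, s0=10, alpha=0.05, num_boot=200, rng=rng)
    print(stat, crit, stat > crit)
```

The test rejects the null of no change point when the observed statistic exceeds the bootstrap critical value.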

Inference of high-dimensional linear models with time-varying coefficients

Authors: Chen Xiaohui, He Yifeng
Publication date: 16/03/2017

We propose a pointwise inference algorithm for high-dimensional linear models with time-varying coefficients. The method is based on a novel combination of the nonparametric kernel smoothing technique and a Lasso bias-corrected ridge regression estimator. Due to the non-stationarity of the model, a dynamic bias-variance decomposition of the estimator is obtained. With a bias-correction procedure, the local null distribution of the estimator of the time-varying coefficient vector is characterized for iid Gaussian and heavy-tailed errors. The limiting null distribution is also established for Gaussian process errors, and we show that the asymptotic properties differ between short-range and long-range dependent errors. Here, p-values are adjusted by a Bonferroni-type correction procedure to control the familywise error rate (FWER) in the asymptotic sense at each time point. The finite-sample performance of the proposed inference algorithm is illustrated with synthetic data and an application to learning brain connectivity using resting-state fMRI data for Parkinson's disease.

arXiv.org e-Print Archive
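
The abstract above mentions a Bonferroni-type p-value correction for FWER control. A minimal sketch of the plain Bonferroni adjustment follows; the paper's specific variant is not given in the abstract, so this shows only the textbook version, and the function names are hypothetical.

```python
import numpy as np


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(1.0, p * p.size)


def reject_fwer(p_values, alpha=0.05):
    """Reject hypotheses whose adjusted p-value is at most alpha."""
    return bonferroni_adjust(p_values) <= alpha


if __name__ == "__main__":
    print(bonferroni_adjust([0.01, 0.04, 0.5]))
    print(reject_fwer([0.01, 0.04, 0.5]))
```

With m tests, rejecting when the adjusted p-value is at most alpha keeps the probability of any false rejection at or below alpha, whatever the dependence among the tests.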

Hanson-Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data

Authors: Chen Xiaohui, Yang Yun
Publication date: 07/07/2020

We derive a dimension-free Hanson-Wright inequality for quadratic forms of independent sub-gaussian random variables in a separable Hilbert space. Our inequality is an infinite-dimensional generalization of the classical Hanson-Wright inequality for finite-dimensional Euclidean random vectors. We illustrate an application to the generalized $K$-means clustering problem for non-Euclidean data. Specifically, we establish the exponential rate of convergence for a semidefinite relaxation of the generalized $K$-means, which together with a simple rounding algorithm implies the exact recovery of the true clustering structure.

arXiv.org e-Print Archive

Distributed Consensus Resilient to Both Crash Failures and Strategic Manipulations

Authors: Bei Xiaohui, Chen Wei, Zhang Jialin
Publication date: 06/06/2012

In this paper, we study distributed consensus in synchronous systems subject to both unexpected crash failures and strategic manipulations by rational agents in the system. We adapt the concept of collusion-resistant Nash equilibrium to model protocols that are resilient to both crash failures and strategic manipulations of a group of colluding agents. For a system with $n$ distributed agents, we design a deterministic protocol that tolerates 2 colluding agents and a randomized protocol that tolerates $n - 1$ colluding agents, and both tolerate any number of failures. We also show that if colluders are allowed an extra communication round after each synchronous round, there is no protocol that can tolerate even 2 colluding agents and 1 crash failure.

arXiv.org e-Print Archive